Towards Semantic Web Information Extraction
Abstract
The approach to Semantic Web Information Extraction (IE) presented here is implemented in KIM, a platform for semantic indexing, annotation, and retrieval. It combines IE based on the mature text engineering platform GATE with Semantic Web-compliant knowledge representation and management. The cornerstone is the automatic generation of named-entity (NE) annotations with class and instance references to a semantic repository. A simple upper-level ontology was designed and is used; it provides detailed coverage of the most popular entity types (Person, Organization, Location, etc.; more than 250 classes). A knowledge base (KB) with de facto exhaustive coverage of real-world entities of general importance is maintained, used, and constantly enriched. Extensions of the ontology and KB handle all the lexical resources used for IE; most notably, instead of being kept in gazetteer lists, the aliases of specific entities are stored together with those entities in the KB. A Semantic Gazetteer uses the KB to generate lookup annotations. Ontology-aware pattern-matching grammars allow precise class information to be handled via rules at the optimal level of generality. The grammars are used to recognize NEs, with class and instance information referring to the KIM ontology and KB. Recognition of identity relations between the entities is used to unify their references to the KB. Based on the recognized NEs, template relation construction is performed via grammar rules, and the KB is enriched with the recognized relations between entities. In the final phase of the IE process, previously unknown aliases and entities are added to the KB with their specific types.
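The gazetteer and grammar steps described in the abstract can be illustrated with a minimal Python sketch. It is not KIM's or GATE's actual implementation: the toy ontology `SUBCLASS_OF`, the toy knowledge base `KB`, the `kb:` identifiers, and the functions `semantic_gazetteer`, `is_subclass`, and `select` are all hypothetical names chosen for illustration. The sketch assumes that aliases are stored with each KB entity and that a rule may name a general class and still match annotations of its subclasses.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: child class -> parent class.
SUBCLASS_OF = {
    "Company": "Organization",
    "City": "Location",
    "Organization": "Entity",
    "Location": "Entity",
    "Person": "Entity",
}

# Hypothetical toy KB: instance id -> (class, aliases kept with the entity).
KB = {
    "kb:IBM": ("Company", ["IBM", "International Business Machines"]),
    "kb:London": ("City", ["London"]),
}


@dataclass(frozen=True)
class Lookup:
    start: int      # character offset where the alias begins
    end: int        # character offset just past the alias
    instance: str   # reference to the KB instance
    cls: str        # ontology class of that instance


def is_subclass(cls, ancestor):
    """Return True if cls equals ancestor or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def semantic_gazetteer(text):
    """Emit one lookup annotation per occurrence of any KB alias in text."""
    annotations = []
    for instance, (cls, aliases) in KB.items():
        for alias in aliases:
            pos = text.find(alias)
            while pos != -1:
                annotations.append(Lookup(pos, pos + len(alias), instance, cls))
                pos = text.find(alias, pos + 1)
    return sorted(annotations, key=lambda a: a.start)


def select(annotations, ancestor):
    """Keep annotations whose class is subsumed by the class a rule names."""
    return [a for a in annotations if is_subclass(a.cls, ancestor)]


if __name__ == "__main__":
    lookups = semantic_gazetteer("IBM opened an office in London.")
    for a in select(lookups, "Organization"):
        print(a)
```

A rule written against `Organization` matches the `Company` annotation without listing every subclass, which is one way to read "rules at the optimal level of generality".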
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Towards Cross-Media Feature Extraction
In this paper we describe past and present work dealing with the use of textual resources, out of which semantic information can be extracted in order to provide for semantic annotation and indexing of associated image or video material. Since the emergence of semantic web technologies and resources, entities, relations and events extracted from textual resources by means of Information Extract...
French-Written Event Extraction Based on Contextual Exploration
Event extraction is a significant task in information extraction. This importance increases more and more with the explosion of textual data available on the Web, the appearance of Web 2.0 and the tendency towards the Semantic Web. Thus, we propose a generic approach to extract events from text and to analyze them. We propose an event extraction algorithm with a polynomial complexity O(n), and ...
Towards Knowledge Acquisition from Information Extraction
In our research to use information extraction to help populate the semantic web, we have encountered significant obstacles to interoperability between the technologies. We believe these obstacles to be endemic to the basic paradigms, and not quirks of the specific implementations we have worked with. In particular, we identify five dimensions of interoperability that must be addressed to succes...
An Overview of the Semantic Web Improving Web Data Accessibility and Performance
The Internet has known a very fast evolution, going from the Web 1.0, i.e., the traditional Web where users are merely consumers of static information, to the more dynamic Web 2.0, known as the Social or Collaborative Web, where users produce and consume information simultaneously, and heading toward the more sophisticated and eagerly anticipated Web 3.0, better known as the Semantic Web: exten...
Towards the semantic web in e-tourism: can annotation do the trick?
Semantic Web technology may support more advanced E-Commerce. Namely the representation of products and services in the form of ontologies will simplify the automated extraction and processing of explicit information and will make implicit information available for the discovery and comparison of offerings. One common assumption is that the Semantic Web can be made a reality by gradually augmen...